Perf #856: throw (not a returning call) at loop-backedge cancel path — closes objects/strings/closures to Node parity#876

Merged: nickna merged 1 commit into main from wrk/issue-856-cancel-throw on Jun 21, 2026
Conversation

nickna (Owner) commented Jun 21, 2026

Summary

Closes most of the remaining compiled-vs-Node perf gap (epic #856) with a single low-risk codegen change to the loop-backedge cancellation check.

Every compiled loop polls a cooperative-cancellation flag at its backedge so the runner can unwind runaway loops (#74). The flag read is free, but the cold path was `call $Runtime.CheckCancellation()`, a helper that throws internally. From RyuJIT's flow-graph view that is a returning call. On SysV x64 every XMM register is caller-saved, so a returning call inside a loop forces the loop-carried doubles (and the counter) to be stack-resident on every iteration: a load/store per use, roughly doubling the cost of a tight numeric loop.

Fix: the backedge now emits `call $Runtime.BuildCancellationException(); throw`. The new factory only constructs the OperationCanceledException; the `throw` opcode executes at the backedge. Because `throw` does not return, the loop variables are dead on the cancel path and stay in registers on the hot path. `CheckCancellation()` is retained for the non-hot-loop sites (event loop, deep-recursion guard).
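For readers who think in source rather than IL, the two backedge shapes can be sketched in C# roughly as follows. This is an illustrative approximation, not the emitted code: the class name, the `CancelRequested` field, and the two loop methods are hypothetical stand-ins, and `NoInlining` is used here only to keep the helpers as real calls.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class CancelCheckShapes
{
    // Hypothetical stand-in for the runner's cooperative-cancellation flag.
    public static volatile bool CancelRequested;

    // Old shape: a helper that throws internally. Its call site is an
    // ordinary call, so the loop must assume control can come back.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void CheckCancellation()
    {
        if (CancelRequested) throw new OperationCanceledException();
    }

    // New shape: the factory only constructs the exception object.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Exception BuildCancellationException()
        => new OperationCanceledException();

    // Cold path is a call that may return into the loop.
    public static double ProductWithReturningCall(int n)
    {
        double result = 1.0;
        for (int i = 1; i <= n; i++)
        {
            result *= i;
            if (CancelRequested) CheckCancellation();
        }
        return result;
    }

    // Cold path ends in a throw at the backedge, so nothing in the loop
    // is live after it.
    public static double ProductWithThrow(int n)
    {
        double result = 1.0;
        for (int i = 1; i <= n; i++)
        {
            result *= i;
            if (CancelRequested) throw BuildCancellationException();
        }
        return result;
    }
}
```

Both methods compute the same product and both raise OperationCanceledException once the flag is set; the only difference is whether the cold block can fall back into the loop body.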

This supersedes the inline-volatile form (#874), which removed the unconditional call overhead but left the returning call in the loop's flow graph, so the XMM spill remained. A throttle-every-N variant ties it for the same reason. The fix is structural: make the cancel path non-returning.

Why it works (controlled microbench, result*=i loop, Linux x64 / .NET 10)

| variant | ns/iter |
|---|---|
| `if(flag) call CheckCancellation()` (old) | 2.09 |
| volatile flag read + branch only, no call | 1.15 |
| `if(flag) throw Factory()` (call factory, throw result) | 1.15 |
| no check at all | 1.14 |

The entire penalty was the returning call, not the flag read.
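A ns/iter figure of this kind can be obtained with a small timing harness. The sketch below is an assumption about how such a measurement might be set up, not the harness used for the table: `NsPerIter`, `Measure`, `Product`, and `Sink` are hypothetical names, and it reports the minimum over several repeats.

```csharp
using System;
using System.Diagnostics;

public static class NsPerIter
{
    // Accumulates results so the measured loop cannot be optimized away.
    public static double Sink;

    // Returns the best (minimum) nanoseconds per iteration over `repeats` runs.
    public static double Measure(Func<int, double> loop, int iterations, int repeats)
    {
        Sink += loop(iterations); // warm-up run so the loop is JIT-compiled

        double best = double.MaxValue;
        for (int r = 0; r < repeats; r++)
        {
            long start = Stopwatch.GetTimestamp();
            Sink += loop(iterations);
            long ticks = Stopwatch.GetTimestamp() - start;

            // Convert Stopwatch ticks to nanoseconds, then divide by iterations.
            double ns = ticks * 1e9 / Stopwatch.Frequency / iterations;
            if (ns < best) best = ns;
        }
        return best;
    }

    // Baseline loop with no cancellation check (the "no check at all" row).
    public static double Product(int n)
    {
        double result = 1.0;
        for (int i = 1; i <= n; i++) result *= i;
        return result;
    }
}
```

Calling `NsPerIter.Measure(NsPerIter.Product, 100_000_000, 5)` gives a baseline to compare the checked variants against. A single warm-up call may not be enough to reach fully optimized code under tiered compilation, so absolute numbers from a sketch like this should be treated with caution.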

Results — compiled vs Node, min times at largest input

| Workload | before | after |
|---|---|---|
| objects | 2.52× slower | 1.00× (parity) |
| strings | 1.26× slower | 0.91× (faster than Node) |
| closures | 1.13× slower | 1.02× (parity) |
| count-primes | 1.45× slower | 1.13× |
| factorial | 2.27× slower | 1.22× |

fibonacci / array-methods: still 2–2.4× faster

5 of 7 workloads now meet or beat Node; the other two are within ~1.2× and now at the codegen floor (V8's multiply loop is ~0.2 ns/iter tighter; count-primes' residual is List<bool> index-write bounds checks). The change benefits all compiled loops, not just the benchmark suite.

Correctness

  • IL verification passes on loop-heavy output (--compile … --verify).
  • Execute_InfiniteLoop_CancellationUnwindsCooperatively (the guard for #74, "Test262 runner: compile-mode cooperative cancellation") passes: runaway loops still unwind with the same OperationCanceledException.
  • Full suite: 13998 passed. The handful of failures are pre-existing/flaky and unrelated: the non-Test262 ones pass in isolation (parallel-runner contention); the Test262 ones are stale-baseline drift (97–133 "new passes"), and the interpreter baseline shows the same drift despite sharing zero code with this IL-only change.

Files

  • Compilation/EmittedRuntime.cs: BuildCancellationExceptionMethod field
  • Compilation/RuntimeEmitter.RuntimeClass.cs: emit the factory method
  • Compilation/StatementEmitterBase.cs: EmitCancellationCheck emits call … BuildCancellationException(); throw
  • STATUS.md: §18 updated
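The `call …; throw` sequence at the backedge can be illustrated with System.Reflection.Emit. The following is a minimal standalone sketch under stated assumptions, not the project's `EmitCancellationCheck`: it uses a `DynamicMethod`, and the class `CancelThrowEmitSketch`, its `CancelRequested` field, and `BuildLoop` are hypothetical names introduced for the example.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class CancelThrowEmitSketch
{
    // Hypothetical stand-in for the runtime's cancellation flag.
    public static volatile bool CancelRequested;

    // Factory that only constructs the exception; the throw is emitted inline.
    public static Exception BuildCancellationException()
        => new OperationCanceledException();

    // Builds: double f(int n) { r = 1; for (i = 1; i <= n; i++) { r *= i; <check> } return r; }
    public static Func<int, double> BuildLoop()
    {
        FieldInfo flag = typeof(CancelThrowEmitSketch).GetField(nameof(CancelRequested))
            ?? throw new InvalidOperationException("flag field not found");
        MethodInfo factory = typeof(CancelThrowEmitSketch).GetMethod(nameof(BuildCancellationException))
            ?? throw new InvalidOperationException("factory method not found");

        var dm = new DynamicMethod("Loop", typeof(double), new[] { typeof(int) },
                                   typeof(CancelThrowEmitSketch).Module, true);
        ILGenerator il = dm.GetILGenerator();
        LocalBuilder result = il.DeclareLocal(typeof(double));
        LocalBuilder i = il.DeclareLocal(typeof(int));
        Label top = il.DefineLabel(), test = il.DefineLabel(), noCancel = il.DefineLabel();

        il.Emit(OpCodes.Ldc_R8, 1.0); il.Emit(OpCodes.Stloc, result); // result = 1.0
        il.Emit(OpCodes.Ldc_I4_1);    il.Emit(OpCodes.Stloc, i);      // i = 1
        il.Emit(OpCodes.Br, test);

        il.MarkLabel(top);
        il.Emit(OpCodes.Ldloc, result);                               // result *= i
        il.Emit(OpCodes.Ldloc, i);
        il.Emit(OpCodes.Conv_R8);
        il.Emit(OpCodes.Mul);
        il.Emit(OpCodes.Stloc, result);

        // Backedge cancellation check: volatile flag read, then call + throw.
        il.Emit(OpCodes.Volatile);
        il.Emit(OpCodes.Ldsfld, flag);
        il.Emit(OpCodes.Brfalse, noCancel);
        il.Emit(OpCodes.Call, factory);   // pushes the exception object
        il.Emit(OpCodes.Throw);           // non-returning: ends the cold block
        il.MarkLabel(noCancel);

        il.Emit(OpCodes.Ldloc, i);                                    // i++
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc, i);

        il.MarkLabel(test);
        il.Emit(OpCodes.Ldloc, i);                                    // while (i <= n)
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ble, top);

        il.Emit(OpCodes.Ldloc, result);
        il.Emit(OpCodes.Ret);

        return (Func<int, double>)dm.CreateDelegate(typeof(Func<int, double>));
    }
}
```

With the flag clear, the delegate returned by `BuildLoop()` runs the multiply loop to completion; setting `CancelRequested` before or during a call makes the next backedge check throw OperationCanceledException.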

Refs #856, #74. Supersedes the codegen approach in #874.

…path

Every compiled loop polls a cooperative-cancellation flag at its backedge
(#74). The flag read is free, but the cold path was `call CheckCancellation()`
— a helper that throws internally. From RyuJIT's flow-graph view that is a
*returning* call, and on SysV x64 every XMM register is caller-saved, so a
returning call inside a loop forces the loop-carried doubles (and counter) to
be stack-resident on every iteration: a load/store per use, roughly doubling a
tight numeric loop.

Fix: the backedge now emits `call $Runtime.BuildCancellationException(); throw`.
The new factory only *constructs* the OperationCanceledException; the `throw`
opcode happens at the backedge. Because `throw` does not return, the loop vars
are dead on the cancel path and stay in registers on the hot path.
CheckCancellation() is retained for the non-hot-loop sites (event loop,
deep-recursion guard). Cancellation semantics are unchanged — same exception,
same message, thrown at the same point.

Controlled microbench (result*=i loop): `call CheckCancellation()` 2.09 ns/iter
vs `throw Factory()` 1.15 ns/iter (volatile-read-only is also 1.15 — the read
was never the cost). Real benchmarks @largest size, compiled vs Node:
- objects     2.52x slower -> 1.00x (parity)
- strings     1.26x slower -> 0.91x (faster than Node)
- closures    1.13x slower -> 1.02x (parity)
- count-primes 1.45x slower -> 1.13x
- factorial   2.27x slower -> 1.22x
5/7 workloads now meet-or-beat Node; benefits all compiled loops. This
supersedes the inline-volatile form (#874), which removed the unconditional
call overhead but left the returning call in the loop's flow graph.

IL verifies; #74 infinite-loop cancellation test still unwinds.
@nickna nickna merged commit a2d82bd into main Jun 21, 2026
3 checks passed